{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 评估simple rag 中的块大小\n",
    "\n",
    "选择合适的块大小对于提高检索增强生成（RAG）管道的检索准确性至关重要。目标是平衡检索性能与响应质量。\n",
    "\n",
    "-----\n",
    "以下方式评估不同的块大小:\n",
    "- 从 PDF 中提取文本\n",
    "- 将文本分割成不同大小的块\n",
    "- 为每个块创建嵌入\n",
    "- 为查询检索相关块\n",
    "- 使用检索到的块生成响应\n",
    "- 评估响应质量\n",
    "- 比较不同块大小的结果\n",
    "\n",
    "-----\n",
    "实现步骤:\n",
    "- 从 PDF 中提取文本：按页获取页面文本\n",
    "- 将文本分割成不同大小的块，为每个块创建嵌入\n",
    "- 根据查询检索相关块\n",
    "- 使用检索到的文本块用模型生成回答\n",
    "- 评估不同大小块的检索回答质量"
   ],
   "id": "bc32e45ff1860c69"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "# 设置环境",
   "id": "76da514a541b5b4e"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:20.204790Z",
     "start_time": "2025-04-23T07:46:19.286042Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import fitz\n",
    "import os\n",
    "import numpy as np\n",
    "import json\n",
    "from openai import OpenAI\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()"
   ],
   "id": "c9efbf6aee35d4c1",
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 1
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "# 设置 OpenAI API 客户端",
   "id": "91d73cdd12e86615"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:20.398270Z",
     "start_time": "2025-04-23T07:46:20.208782Z"
    }
   },
   "cell_type": "code",
   "source": [
    "client = OpenAI(\n",
    "    base_url=os.getenv(\"LLM_BASE_URL\"),\n",
    "    api_key=os.getenv(\"LLM_API_KEY\")\n",
    ")"
   ],
   "id": "327860b638873dfd",
   "outputs": [],
   "execution_count": 2
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "# 从 PDF 中提取文本",
   "id": "dfe2ca5b75d25c6"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:20.601287Z",
     "start_time": "2025-04-23T07:46:20.573625Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def extract_text_from_pdf(pdf_path):\n",
    "    \"\"\"\n",
    "    Extracts text from a PDF file.\n",
    "\n",
    "    Args:\n",
    "    pdf_path (str): Path to the PDF file.\n",
    "\n",
    "    Returns:\n",
    "    str: Extracted text from the PDF.\n",
    "    \"\"\"\n",
    "    # Open the PDF file\n",
    "    mypdf = fitz.open(pdf_path)\n",
    "    all_text = \"\"  # Initialize an empty string to store the extracted text\n",
    "\n",
    "    # Iterate through each page in the PDF\n",
    "    for page in mypdf:\n",
    "        # Extract text from the current page and add spacing\n",
    "        all_text += page.get_text(\"text\") + \" \"\n",
    "\n",
    "    # Return the extracted text, stripped of leading/trailing whitespace\n",
    "    return all_text.strip()\n",
    "\n",
    "# Define the path to the PDF file\n",
    "pdf_path = \"../../data/AI_Information.en.zh-CN.pdf\"\n",
    "\n",
    "# Extract text from the PDF file\n",
    "extracted_text = extract_text_from_pdf(pdf_path)\n",
    "\n",
    "# Print the first 500 characters of the extracted text\n",
    "print(extracted_text[:500])"
   ],
   "id": "8258f16a935ec1ad",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "理解⼈⼯智能\n",
      "第⼀章：⼈⼯智能简介\n",
      "⼈⼯智能 (AI) 是指数字计算机或计算机控制的机器⼈执⾏通常与智能⽣物相关的任务的能⼒。该术\n",
      "语通常⽤于开发具有⼈类特有的智⼒过程的系统，例如推理、发现意义、概括或从过往经验中学习\n",
      "的能⼒。在过去的⼏⼗年中，计算能⼒和数据可⽤性的进步显著加速了⼈⼯智能的开发和部署。\n",
      "历史背景\n",
      "⼈⼯智能的概念已存在数个世纪，经常出现在神话和⼩说中。然⽽，⼈⼯智能研究的正式领域始于\n",
      "20世纪中叶。1956年的达特茅斯研讨会被⼴泛认为是⼈⼯智能的发源地。早期的⼈⼯智能研究侧\n",
      "重于问题解决和符号⽅法。20世纪80年代专家系统兴起，⽽20世纪90年代和21世纪初，机器学习\n",
      "和神经⽹络取得了进步。深度学习的最新突破彻底改变了这⼀领域。\n",
      "现代观察\n",
      "现代⼈⼯智能系统在⽇常⽣活中⽇益普及。从 Siri 和 Alexa 等虚拟助⼿，到流媒体服务和社交媒体\n",
      "上的推荐算法，⼈⼯智能正在影响我们的⽣活、⼯作和互动⽅式。⾃动驾驶汽⻋、先进的医疗诊断\n",
      "技术以及复杂的⾦融建模⼯具的发展，彰显了⼈⼯智能应⽤的⼴泛性和持续增⻓。此外，⼈们对其\n",
      "伦理影响、偏⻅和失业的担忧也⽇益凸显。\n",
      "第⼆章：⼈⼯智能\n"
     ]
    }
   ],
   "execution_count": 3
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 对提取的文本进行分块\n",
    "\n",
    "为了提高检索效率，我们将提取的文本分割成不同大小的重叠块"
   ],
   "id": "b39821178e5fccc5"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:20.612427Z",
     "start_time": "2025-04-23T07:46:20.607252Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def chunk_text(text, n, overlap):\n",
    "    \"\"\"\n",
    "    将文本分割为重叠的块。\n",
    "\n",
    "    Args:\n",
    "    text (str): 要分割的文本\n",
    "    n (int): 每个块的字符数\n",
    "    overlap (int): 块之间的重叠字符数\n",
    "\n",
    "    Returns:\n",
    "    List[str]: 文本块列表\n",
    "    \"\"\"\n",
    "    chunks = []  #\n",
    "    for i in range(0, len(text), n - overlap):\n",
    "        # 添加从当前索引到索引 + 块大小的文本块\n",
    "        chunks.append(text[i:i + n])\n",
    "\n",
    "    return chunks  # Return the list of text chunks\n",
    "\n",
    "# 定义要评估的不同块大小\n",
    "chunk_sizes = [128, 256, 512]\n",
    "\n",
    "# 创建一个字典，用于存储每个块大小对应的文本块\n",
    "text_chunks_dict = {size: chunk_text(extracted_text, size, size // 5) for size in chunk_sizes}\n",
    "\n",
    "# 打印每个块大小生成的块数量\n",
    "for size, chunks in text_chunks_dict.items():\n",
    "    print(f\"Chunk Size: {size}, Number of Chunks: {len(chunks)}\")"
   ],
   "id": "cba4c8bbeac01298",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Chunk Size: 128, Number of Chunks: 98\n",
      "Chunk Size: 256, Number of Chunks: 50\n",
      "Chunk Size: 512, Number of Chunks: 25\n"
     ]
    }
   ],
   "execution_count": 4
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 为文本片段创建嵌入\n",
    "\n",
    "嵌入将文本转换为数值表示，以进行相似性搜索。"
   ],
   "id": "a323e169e68facb"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:24.076781Z",
     "start_time": "2025-04-23T07:46:20.619495Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from tqdm import tqdm\n",
    "import numpy as np\n",
    "import os\n",
    "# 假设client已经被正确初始化和配置\n",
    "\n",
    "def create_embeddings(texts):\n",
    "    \"\"\"\n",
    "    为文本列表生成嵌入\n",
    "\n",
    "    Args:\n",
    "    texts (List[str]): 输入文本列表.\n",
    "\n",
    "    Returns:\n",
    "    List[np.ndarray]: List of numerical embeddings.\n",
    "    \"\"\"\n",
    "    # 确保每次调用不超过64条文本\n",
    "    batch_size = 64\n",
    "    embeddings = []\n",
    "\n",
    "    for i in range(0, len(texts), batch_size):\n",
    "        batch = texts[i:i+batch_size]\n",
    "        response = client.embeddings.create(\n",
    "            model=os.getenv(\"EMBEDDING_MODEL_ID\"),\n",
    "            input=batch\n",
    "        )\n",
    "        # 将响应转换为numpy数组列表并添加到embeddings列表中\n",
    "        embeddings.extend([np.array(embedding.embedding) for embedding in response.data])\n",
    "\n",
    "    return embeddings\n",
    "\n",
    "# 假设text_chunks_dict是一个字典，键是块大小，值是文本块列表\n",
    "chunk_embeddings_dict = {}\n",
    "for size, chunks in tqdm(text_chunks_dict.items(), desc=\"Generating Embeddings\"):\n",
    "    chunk_embeddings_dict[size] = create_embeddings(chunks)\n"
   ],
   "id": "c898cfe828782381",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Generating Embeddings: 100%|██████████| 3/3 [00:03<00:00,  1.15s/it]\n"
     ]
    }
   ],
   "execution_count": 5
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "# 语义搜索",
   "id": "8cdc239a6d890452"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:24.087449Z",
     "start_time": "2025-04-23T07:46:24.084100Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def cosine_similarity(vec1, vec2):\n",
    "    \"\"\"\n",
    "    Computes cosine similarity between two vectors.\n",
    "\n",
    "    Args:\n",
    "    vec1 (np.ndarray): First vector.\n",
    "    vec2 (np.ndarray): Second vector.\n",
    "\n",
    "    Returns:\n",
    "    float: Cosine similarity score.\n",
    "    \"\"\"\n",
    "\n",
    "    # Compute the dot product of the two vectors\n",
    "    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))"
   ],
   "id": "875c073a573cd49e",
   "outputs": [],
   "execution_count": 6
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:24.109706Z",
     "start_time": "2025-04-23T07:46:24.104311Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def retrieve_relevant_chunks(query, text_chunks, chunk_embeddings, k=5):\n",
    "    \"\"\"\n",
    "    检索与查询最相关的前k个文本块\n",
    "\n",
    "    Args:\n",
    "    query (str): 用户查询\n",
    "    text_chunks (List[str]): 文本块列表\n",
    "    chunk_embeddings (List[np.ndarray]): 文本块的嵌入列表\n",
    "    k (int): 返回的前k个块数量\n",
    "\n",
    "    Returns:\n",
    "    List[str]: 最相关的文本块列表\n",
    "    \"\"\"\n",
    "    # 为查询生成一个嵌入 - 将查询作为列表传递并获取第一个项目\n",
    "    query_embedding = create_embeddings([query])[0]\n",
    "\n",
    "    # 计算查询嵌入与每个块嵌入之间的余弦相似度\n",
    "    similarities = [cosine_similarity(query_embedding, emb) for emb in chunk_embeddings]\n",
    "\n",
    "    # 获取前k个最相似块的索引\n",
    "    top_indices = np.argsort(similarities)[-k:][::-1]\n",
    "\n",
    "    # 返回前k个最相关的文本块\n",
    "    return [text_chunks[i] for i in top_indices]"
   ],
   "id": "7f78e8a425a5747d",
   "outputs": [],
   "execution_count": 7
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:24.363108Z",
     "start_time": "2025-04-23T07:46:24.118397Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# 从 JSON 文件加载验证数据\n",
    "with open('../../data/val.json', encoding=\"utf-8\") as f:\n",
    "    data = json.load(f)\n",
    "\n",
    "# 从验证数据中提取第一个查询\n",
    "query = data[3]['question']\n",
    "\n",
    "# 对于每个块大小，检索相关的文本块\n",
    "retrieved_chunks_dict = {size: retrieve_relevant_chunks(query, text_chunks_dict[size], chunk_embeddings_dict[size]) for size in chunk_sizes}\n",
    "\n",
    "# 打印块大小为 256 的检索到的文本块\n",
    "print(retrieved_chunks_dict[256])"
   ],
   "id": "cdad7cb1b7bfec62",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['健\\n医疗诊断与治疗\\n⼈⼯智能正在通过分析医学影像、预测患者预后并协助制定治疗计划，彻底改变医学诊断和治疗。\\n⼈⼯智能⼯具能够提⾼准确性、效率和患者护理⽔平。\\n药物研发\\n⼈⼯智能通过分析⽣物数据、预测药物疗效和识别潜在候选药物，加速药物的发现和开发。⼈⼯智\\n能系统缩短了新疗法上市的时间并降低了成本。\\n个性化医疗\\n⼈⼯智能通过分析个体患者数据、预测治疗反应并制定⼲预措施，实现个性化医疗。个性化医疗可\\n提⾼治疗效果并减少不良反应。\\n机器⼈⼿术\\n⼈⼯智能机器⼈⼿术系统能够帮助外科医⽣以更⾼的精度和控制⼒执⾏复杂的⼿', '问、提供指导并跟踪学习进度。这些\\n⼯具扩⼤了教育覆盖⾯，并提升了学习成果。\\n⾃动评分和反馈\\n⼈⼯智能⾃动化评分和反馈流程，节省教育⼯作者的时间，并及时为学⽣提供反馈。⼈⼯智能系统\\n可以评估论⽂、作业和考试，找出需要改进的地⽅。\\n教育数据挖掘\\n教育数据挖掘利⽤⼈⼯智能分析学⽣数据，识别学习模式并预测学习成果。这些信息可以为教学策\\n略提供参考，改进教育项⽬，并增强学⽣⽀持服务。\\n 第 11 章：⼈⼯智能与医疗保健\\n医疗诊断与治疗\\n⼈⼯智能正在通过分析医学影像、预测患者预后并协助制定治疗计划，彻底改变医学诊断和治', '果并减少不良反应。\\n机器⼈⼿术\\n⼈⼯智能机器⼈⼿术系统能够帮助外科医⽣以更⾼的精度和控制⼒执⾏复杂的⼿术。这些系统能够\\n提⾼⼿术灵活性，减少创伤，并改善患者的治疗效果。\\n医疗保健管理\\n⼈⼯智能通过⾃动化任务、管理患者记录和优化⼯作流程来简化医疗保健管理。⼈⼯智能系统可以\\n提⾼效率、降低成本并增强患者体验。\\n第 12 章：⼈⼯智能与⽹络安全\\n威胁检测与预防\\n⼈⼯智能通过检测和预防威胁、分析⽹络流量以及识别漏洞来增强⽹络安全。⼈⼯智能系统可以⾃\\n动执⾏安全任务，提⾼威胁检测的准确性，并增强整体⽹络安全态势。\\n异', '、个性化医疗和机器⼈⼿术等应⽤改变医疗保健。⼈⼯智能\\n⼯具可以分析医学图像、预测患者预后并协助制定治疗计划。\\n⾦融\\n在⾦融领域，⼈⼯智能⽤于欺诈检测、算法交易、⻛险管理和客⼾服务。⼈⼯智能算法可以分析⼤\\n型数据集，以识别模式、预测市场趋势并实现财务流程⾃动化。\\n 运输\\n随着⾃动驾驶汽⻋、交通优化系统和物流管理的发展，⼈⼯智能正在彻底改变交通运输。⾃动驾驶\\n汽⻋利⽤⼈  ⼯智能感知周围环境、做出驾驶决策并安全⾏驶。\\n零售\\n零售⾏业利⽤⼈⼯智能进⾏个性化推荐、库存管理、客服聊天机器⼈和供应链优化。⼈⼯智能系统\\n', '学⽣的个性化需求，提供反馈，并打造定制化的学习体验。\\n娱乐\\n娱乐⾏业将⼈⼯智能⽤于内容推荐、游戏开发和虚拟现实体验。⼈⼯智能算法分析⽤⼾偏好，推荐\\n电影、⾳乐和游戏，从⽽增强⽤⼾参与度。\\n⽹络安全\\n⼈⼯智能在⽹络安全领域⽤于检测和应对威胁、分析⽹络流量以及识别漏洞。⼈⼯智能系统可以⾃\\n动执⾏安全任务，提⾼威胁检测的准确性，并增强整体⽹络安全态势。\\n第四章：⼈⼯智能的伦理和社会影响\\n⼈⼯智能的快速发展和部署引发了重⼤的伦理和社会担忧。这些担忧包括：\\n偏⻅与公平\\n⼈⼯智能系统可能会继承并放⼤其训练数据中存在的偏']\n"
     ]
    }
   ],
   "execution_count": 8
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 基于检索到的片段生成响应\n",
    "\n",
    "基于检索到的文本为块大小 256 生成一个响应"
   ],
   "id": "276dda9e70c91966"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:36.532406Z",
     "start_time": "2025-04-23T07:46:24.376031Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# AI 助手的系统提示\n",
    "system_prompt = \"你是一个AI助手，严格根据给定的上下文进行回答。如果无法直接从提供的上下文中得出答案，请回复：'我没有足够的信息来回答这个问题。'\"\n",
    "\n",
    "def generate_response(query, system_prompt, retrieved_chunks):\n",
    "    \"\"\"\n",
    "    基于检索到的文本块生成 AI 回答。\n",
    "\n",
    "    Args:\n",
    "    query (str): 用户查询\n",
    "    retrieved_chunks (List[str]): 检索到的文本块列表\n",
    "    model (str): AI model.\n",
    "\n",
    "    Returns:\n",
    "    str: AI-generated response.\n",
    "    \"\"\"\n",
    "    # 将检索到的文本块合并为一个上下文字符串\n",
    "    context = \"\\n\".join([f\"Context {i+1}:\\n{chunk}\" for i, chunk in enumerate(retrieved_chunks)])\n",
    "\n",
    "    # 通过组合上下文和查询创建用户提示\n",
    "    user_prompt = f\"{context}\\n\\nQuestion: {query}\"\n",
    "\n",
    "    # Generate the AI response using the specified model\n",
    "    response = client.chat.completions.create(\n",
    "        model=os.getenv(\"LLM_MODEL_ID\"),\n",
    "        temperature=0,\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": system_prompt},\n",
    "            {\"role\": \"user\", \"content\": user_prompt}\n",
    "        ]\n",
    "    )\n",
    "\n",
    "    # Return the content of the AI response\n",
    "    return response.choices[0].message.content\n",
    "\n",
    "# 为每个块大小生成 AI 回答\n",
    "ai_responses_dict = {size: generate_response(query, system_prompt, retrieved_chunks_dict[size]) for size in chunk_sizes}\n",
    "\n",
    "# 打印块大小为 256 的回答\n",
    "print(ai_responses_dict[256])"
   ],
   "id": "62498bfcc710e625",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "人工智能通过以下方式为个性化医疗做出贡献：\n",
      "\n",
      "1. **分析个体患者数据**：人工智能可以处理和分析大量个体患者的健康数据，包括基因组信息、病史、生活方式等，从而深入了解每位患者的独特情况。\n",
      "\n",
      "2. **预测治疗反应**：基于对个体数据的分析，人工智能可以预测患者对不同治疗方案的响应情况，帮助医生选择最有效的治疗方法。\n",
      "\n",
      "3. **制定干预措施**：人工智能能够根据患者的具体情况，制定个性化的治疗计划和干预措施，以提高治疗效果并减少不良反应。\n",
      "\n",
      "这些功能共同提升了个性化医疗的水平，使治疗更加精准和高效。\n"
     ]
    }
   ],
   "execution_count": 9
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# 评估响应质量\n",
    "\n",
    "\n",
    "根据忠实度和相关性对回复进行评分"
   ],
   "id": "a5d6ae9d5df888de"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:36.547579Z",
     "start_time": "2025-04-23T07:46:36.544851Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# 定义评估评分系统的常量\n",
    "SCORE_FULL = 1.0     # 完全匹配或完全令人满意\n",
    "SCORE_PARTIAL = 0.5  # 部分匹配或部分令人满意\n",
    "SCORE_NONE = 0.0     # 无匹配或不令人满意"
   ],
   "id": "fc0841b38bc78526",
   "outputs": [],
   "execution_count": 10
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:36.574796Z",
     "start_time": "2025-04-23T07:46:36.572336Z"
    }
   },
   "cell_type": "code",
   "source": [
    "# 定义严格的评估提示模板\n",
    "FAITHFULNESS_PROMPT_TEMPLATE = \"\"\"\n",
    "评估 AI 回答与真实答案的一致性、忠实度。\n",
    "用户查询: {question}\n",
    "AI 回答: {response}\n",
    "真实答案: {true_answer}\n",
    "\n",
    "一致性衡量 AI 回答与真实答案中的事实对齐的程度，且不包含幻觉信息。\n",
    "忠实度衡量的是AI的回答在没有幻觉的情况下与真实答案中的事实保持一致的程度。\n",
    "\n",
    "指示：\n",
    "- 严格使用以下值进行评分：\n",
    "    * {full} = 完全一致，与真实答案无矛盾\n",
    "    * {partial} = 部分一致，存在轻微矛盾\n",
    "    * {none} = 不一致，存在重大矛盾或幻觉信息\n",
    "- 仅返回数值评分（{full}, {partial}, 或 {none}），无需解释或其他附加文本。\n",
    "\"\"\"\n"
   ],
   "id": "f66fcbd48c593e02",
   "outputs": [],
   "execution_count": 11
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:36.591284Z",
     "start_time": "2025-04-23T07:46:36.588719Z"
    }
   },
   "cell_type": "code",
   "source": [
    "RELEVANCY_PROMPT_TEMPLATE = \"\"\"\n",
    "评估 AI 回答与用户查询的相关性。\n",
    "用户查询: {question}\n",
    "AI 回答: {response}\n",
    "\n",
    "相关性衡量回答在多大程度上解决了用户的问题。\n",
    "\n",
    "指示：\n",
    "- 严格使用以下值进行评分：\n",
    "    * {full} = 完全相关，直接解决查询\n",
    "    * {partial} = 部分相关，解决了一些方面\n",
    "    * {none} = 不相关，未能解决查询\n",
    "- 仅返回数值评分（{full}, {partial}, 或 {none}），无需解释或其他附加文本。\n",
    "\"\"\""
   ],
   "id": "cfeea3ab4a1ced67",
   "outputs": [],
   "execution_count": 12
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-04-23T07:46:38.348920Z",
     "start_time": "2025-04-23T07:46:36.594330Z"
    }
   },
   "cell_type": "code",
   "source": [
    "def evaluate_response(question, response, true_answer):\n",
    "        \"\"\"\n",
    "        根据忠实度和相关性评估 AI 生成的回答质量\n",
    "\n",
    "        Args:\n",
    "        question (str): 用户的原始问题\n",
    "        response (str): 被评估的 AI 生成的回答\n",
    "        true_answer (str): 作为基准的真实答案\n",
    "\n",
    "        Returns:\n",
    "        Tuple[float, float]: 包含 (忠实度评分, 相关性评分) 的元组。\n",
    "                             每个评分可能是：1.0（完全匹配）、0.5（部分匹配）或 0.0（无匹配）。\n",
    "        \"\"\"\n",
    "        # 格式化评估提示\n",
    "        faithfulness_prompt = FAITHFULNESS_PROMPT_TEMPLATE.format(\n",
    "                question=question,\n",
    "                response=response,\n",
    "                true_answer=true_answer,\n",
    "                full=SCORE_FULL,\n",
    "                partial=SCORE_PARTIAL,\n",
    "                none=SCORE_NONE\n",
    "        )\n",
    "\n",
    "        relevancy_prompt = RELEVANCY_PROMPT_TEMPLATE.format(\n",
    "                question=question,\n",
    "                response=response,\n",
    "                full=SCORE_FULL,\n",
    "                partial=SCORE_PARTIAL,\n",
    "                none=SCORE_NONE\n",
    "        )\n",
    "\n",
    "        # 模型进行忠实度评估\n",
    "        faithfulness_response = client.chat.completions.create(\n",
    "               model=os.getenv(\"LLM_MODEL_ID\"),\n",
    "                temperature=0,\n",
    "                messages=[\n",
    "                        {\"role\": \"system\", \"content\": \"你是一个客观的评估者，仅返回数值评分。\"},\n",
    "                        {\"role\": \"user\", \"content\": faithfulness_prompt}\n",
    "                ]\n",
    "        )\n",
    "\n",
    "        # 模型进行相关性评估\n",
    "        relevancy_response = client.chat.completions.create(\n",
    "                model=os.getenv(\"LLM_MODEL_ID\"),\n",
    "                temperature=0,\n",
    "                messages=[\n",
    "                        {\"role\": \"system\", \"content\": \"你是一个客观的评估者，仅返回数值评分。\"},\n",
    "                        {\"role\": \"user\", \"content\": relevancy_prompt}\n",
    "                ]\n",
    "        )\n",
    "\n",
    "        # 提取评分并处理潜在的解析错误\n",
    "        try:\n",
    "                faithfulness_score = float(faithfulness_response.choices[0].message.content.strip())\n",
    "        except ValueError:\n",
    "                print(\"Warning: 无法解析忠实度评分，将默认为 0\")\n",
    "                faithfulness_score = 0.0\n",
    "\n",
    "        try:\n",
    "                relevancy_score = float(relevancy_response.choices[0].message.content.strip())\n",
    "        except ValueError:\n",
    "                print(\"Warning: 无法解析相关性评分，将默认为 0\")\n",
    "                relevancy_score = 0.0\n",
    "\n",
    "        return faithfulness_score, relevancy_score\n",
    "\n",
    "# 第一条验证数据的真实答案\n",
    "true_answer = data[3]['ideal_answer']\n",
    "\n",
    "# 评估块大小为 256 和 128 的回答\n",
    "faithfulness, relevancy = evaluate_response(query, ai_responses_dict[256], true_answer)\n",
    "faithfulness2, relevancy2 = evaluate_response(query, ai_responses_dict[128], true_answer)\n",
    "\n",
    "# 打印评估分数\n",
    "print(f\"忠实度评分 (Chunk Size 256): {faithfulness}\")\n",
    "print(f\"相关性评分 (Chunk Size 256): {relevancy}\")\n",
    "\n",
    "print(f\"\\n\")\n",
    "\n",
    "print(f\"忠实度评分 (Chunk Size 128): {faithfulness2}\")\n",
    "print(f\"忠实度评分 (Chunk Size 128): {relevancy2}\")"
   ],
   "id": "87d1cfc65c0803cc",
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "忠实度评分 (Chunk Size 256): 1.0\n",
      "相关性评分 (Chunk Size 256): 1.0\n",
      "\n",
      "\n",
      "忠实度评分 (Chunk Size 128): 1.0\n",
      "忠实度评分 (Chunk Size 128): 1.0\n"
     ]
    }
   ],
   "execution_count": 13
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
